Handling adversarial concept drift in streaming data
Abstract
Classifiers operating in dynamic, real-world environments are vulnerable to adversarial activity, which causes the data distribution to change over time. These changes are traditionally referred to as concept drift, and several approaches have been developed in the literature to handle and detect drift. However, most concept drift handling techniques approach it as a domain-independent task so that they apply to a wide gamut of reactive systems. These techniques were developed from an adversarial-agnostic perspective: they naively assume that drift is a benign change that can be fixed by updating the model. This is not the case when an active adversary is trying to evade the deployed classification system. In such an environment, concept drift has unique properties, because the drift is intended to degrade the system while also being designed to avoid detection by traditional concept drift detection techniques. This special category of drift is termed adversarial drift, and this paper analyzes its characteristics and impact in a streaming environment. A novel framework for dealing with adversarial concept drift, called the Predict-Detect streaming framework, is proposed. This framework uses adversarial forethought and incorporates the context of classification into the drift detection task, to provide leverage in dynamic-adversarial domains. Experimental evaluation of the framework on generated adversarial drifting data streams demonstrates that it provides a reliable unsupervised indication of drift and recovers from drifts swiftly. While traditional partially labeled concept drift detection methodologies fail to detect adversarial drifts, the proposed framework detects such drifts and operates with less than 6% labeled data, on average.
The framework also benefits active learning over imbalanced data streams by innately providing feature-space honeypots, where minority-class adversarial samples may be captured. It provides an application-independent, distribution-independent, incremental, and semi-supervised system for continuously dealing with adversarial activity at test time, and a generic way to implement reactive security for classification-based systems.
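The abstract does not spell out how the unsupervised drift signal is computed. As an illustration only, the sketch below assumes a disagreement-based scheme: a predictor and a detector are trained on disjoint feature subsets, the disagreement rate between them on unlabeled data is monitored, and disagreeing samples are queued for labeling as a "honeypot". This is our assumption, not a confirmed description of the paper's method. The nearest-centroid classifier, the feature split (`PRED_FEATS`, `DET_FEATS`), the synthetic generator `make_batch`, and the alarm margin `THRESHOLD` are all hypothetical choices made for the example.

```python
import random
from statistics import mean

# Hypothetical feature split: the predictor sees features 0-1, the detector 2-3.
PRED_FEATS = [0, 1]
DET_FEATS = [2, 3]

def train_centroids(X, y, feats):
    """Fit a nearest-centroid model (a stand-in classifier) on the given features."""
    return {
        label: [mean(x[f] for x, t in zip(X, y) if t == label) for f in feats]
        for label in set(y)
    }

def predict(model, x, feats):
    """Return the label whose centroid is closest in the chosen feature subspace."""
    def sq_dist(centroid):
        return sum((x[f] - c) ** 2 for f, c in zip(feats, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

def disagreement(pred_model, det_model, batch):
    """Fraction of unlabeled samples on which predictor and detector disagree,
    plus the disagreeing samples themselves (candidates to send for labeling)."""
    flagged = [x for x in batch
               if predict(pred_model, x, PRED_FEATS) != predict(det_model, x, DET_FEATS)]
    return len(flagged) / len(batch), flagged

def make_batch(rng, n, evade=False):
    """Hypothetical stream: class 0 centered at 0 and class 1 at 3 on all features.
    With evade=True, class-1 samples shift only the predictor's features toward
    class 0, mimicking an adversary who targets the deployed predictor."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        center = [3.0 * label] * 4
        if evade and label == 1:
            for f in PRED_FEATS:
                center[f] = 0.0
        X.append([c + rng.gauss(0, 0.5) for c in center])
        y.append(label)
    return X, y

rng = random.Random(0)
X_train, y_train = make_batch(rng, 400)
predictor = train_centroids(X_train, y_train, PRED_FEATS)
detector = train_centroids(X_train, y_train, DET_FEATS)

# Baseline disagreement on training data; the alarm margin is a hypothetical choice.
baseline, _ = disagreement(predictor, detector, X_train)
THRESHOLD = baseline + 0.1

# Unlabeled batches: one clean, one with evasive (adversarially drifted) samples.
clean_rate, _ = disagreement(predictor, detector, make_batch(rng, 200)[0])
drift_rate, to_label = disagreement(predictor, detector, make_batch(rng, 200, evade=True)[0])
print(f"clean={clean_rate:.2f} drift={drift_rate:.2f} alarm={drift_rate > THRESHOLD}")
print(f"{len(to_label)} disagreeing samples queued for labeling")
```

In this toy setup the evasive samples fool the predictor but not the detector, so disagreement rises without any labels being needed. Only the disagreeing samples would then be labeled, which is one plausible way to keep the labeling budget small.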
Similar resources
A Dynamic-Adversarial Mining Approach to the Security of Machine Learning
Operating in a dynamic real world environment requires a forward thinking and adversarial aware design for classifiers, beyond fitting the model to the training data. In such scenarios, it is necessary to make classifiers a) harder to evade, b) easier to detect changes in the data distribution over time, and c) be able to retrain and recover from model degradation. While most works in the secur...
Incremental Option Trees for Handling Gradual Concept Drift
Data streams are inherently time-varying and exhibit various types of dynamics. The presence of concept drift in the data significantly influences the accuracy of the learner, thus efficient handling of non-stationarity is an important problem. In this paper, we address the problem of modeling the transition phases between consecutive concepts in gradual concept drift. In those transition phase...
Predictive Analytics on Evolving Data Streams
Ever increasing volumes of sensor readings, transactional records, web data and event logs call for next generation of big data mining technology providing effective and efficient tools for making use of the streaming data. Predictive analytics on data streams is actively studied in research communities and used in the real-world applications that in turn put in the spotlight several important ...
Enhanced Decision Tree Algorithm for Data Streams using adaptation of Concept Drift
Construction of a decision tree is a well researched problem in data mining. Mining of streaming data is a very useful and necessary application. Algorithms such as VFDT and CVFDT are used for decision tree construction, but as a lot of new examples are added, a new optimal model needs to be constructed. Here in this paper, we have provided an algorithm for decision tree construction which uses...
Detecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
Journal title:
- Expert Syst. Appl.
Volume 97
Publication date: 2018